92 research outputs found
Supervised Attentions for Neural Machine Translation
In this paper, we improve the attention or alignment accuracy of neural
machine translation by utilizing the alignments of training sentence pairs. We
simply compute the distance between the machine attentions and the "true"
alignments, and minimize this cost in the training procedure. Our experiments
on large-scale Chinese-to-English task show that our model improves both
translation and alignment qualities significantly over the large-vocabulary
neural machine translation system, and even beats a state-of-the-art
traditional syntax-based system.
Comment: 6 pages. In Proceedings of EMNLP 2016. arXiv admin note: text overlap with arXiv:1605.0314
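The supervision described above can be sketched as an auxiliary loss that penalizes the distance between the model's attention weights and the gold alignments. This is a minimal sketch under assumed interfaces (soft attention rows summing to 1, a 0/1 gold alignment matrix), not the paper's actual implementation:

```python
import numpy as np

def attention_supervision_loss(attn, gold_align, eps=1e-9):
    """Cross-entropy-style distance between model attention and gold alignments.

    attn:       (tgt_len, src_len) attention weights, each row sums to 1
    gold_align: (tgt_len, src_len) 0/1 gold alignment matrix
    """
    # Normalize gold rows so each target word defines a distribution;
    # target words with no gold alignment fall back to uniform.
    row_sums = gold_align.sum(axis=1, keepdims=True)
    gold = np.where(row_sums > 0,
                    gold_align / np.maximum(row_sums, 1),
                    1.0 / attn.shape[1])
    # Average per-target-word cross-entropy; minimized jointly with the
    # usual translation loss during training.
    return float(-(gold * np.log(attn + eps)).sum() / attn.shape[0])
```

In training, this term would simply be added (possibly weighted) to the standard negative log-likelihood objective.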
Research on Feature Extraction of Indicator Card Data for Sucker-Rod Pump Working Condition Diagnosis
Three feature extraction methods for sucker-rod pump indicator card data are studied, simulated, and compared in this paper, based on Fourier Descriptors (FD), Geometric Moment Vectors (GMV), and Gray Level Matrix Statistics (GLMX), respectively. Numerical experiments show that the Fourier Descriptors algorithm requires the least running time and memory space, with possible loss of information due to a nonoptimal number of Fourier Descriptors; the Geometric Moment Vector algorithm is more time-consuming and requires more memory space; and the Gray Level Matrix Statistics algorithm provides low-dimension feature vectors at the cost of more time and memory. Furthermore, the rotational invariance of both the Fourier Descriptors algorithm and the Geometric Moment Vector algorithm may result in improper pattern recognition of indicator card data when they are used for sucker-rod pump working condition diagnosis.
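As an illustration of the first method, Fourier Descriptors of a closed contour (such as a digitized indicator card curve) can be computed from the FFT of its complex-valued boundary points. This is a minimal sketch, not the paper's implementation; the truncation length `k` is an assumed parameter:

```python
import numpy as np

def fourier_descriptors(xs, ys, k=10):
    """First k Fourier Descriptor magnitudes of a closed contour.

    Dropping c_0 removes translation; taking magnitudes removes rotation
    and starting point; dividing by |c_1| removes scale. This invariance
    to rotation is exactly the property the abstract warns about.
    """
    z = np.asarray(xs, dtype=float) + 1j * np.asarray(ys, dtype=float)
    c = np.fft.fft(z)           # complex Fourier coefficients of the boundary
    mags = np.abs(c[1:k + 1])   # skip c_0, the translation term
    return mags / mags[0]       # normalize by |c_1|
```

For a perfect circle sampled uniformly, only the first descriptor is nonzero, which is a quick sanity check for an implementation like this.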
Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation
Recently, CKY-based models have shown great potential in unsupervised grammar
induction thanks to their human-like encoding paradigm, which runs recursively
and hierarchically, but requires cubic time complexity. Recursive
Transformer based on Differentiable Trees (R2D2) makes it possible to scale to
large language model pre-training even with complex tree encoder by introducing
a heuristic pruning method. However, the rule-based pruning approach suffers
from local optimum and slow inference issues. In this paper, we fix those
issues in a unified method. We propose to use a top-down parser as a
model-based pruning method, which also enables parallel encoding during
inference. Specifically, our parser casts parsing as a split-point scoring task,
which first scores all split points for a given sentence, and then recursively
splits a span into two by picking a split point with the highest score in the
current span. The reverse order of the splits is considered as the order of
pruning in the R2D2 encoder. Besides the bidirectional language model loss, we also
optimize the parser by minimizing the KL distance between the tree probabilities
from the parser and from R2D2. Our experiments show that Fast-R2D2 improves
performance significantly in grammar induction and achieves competitive results
in downstream classification tasks.
Comment: EMNLP 202
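The top-down split-point procedure described above can be sketched as a greedy recursion. This is a minimal sketch assuming precomputed per-position split scores; the scoring model itself (the trained parser) is omitted:

```python
def split_order(scores, lo, hi, out=None):
    """Recursively split span [lo, hi) at its highest-scoring split point.

    scores[i] is the score for splitting between positions i and i+1.
    Returns the split points in top-down order; reversing this list
    gives a bottom-up order analogous to the pruning order used by the
    encoder.
    """
    if out is None:
        out = []
    if hi - lo < 2:            # single-token span: nothing left to split
        return out
    # Pick the best split point strictly inside the current span.
    best = max(range(lo, hi - 1), key=lambda i: scores[i])
    out.append(best)
    split_order(scores, lo, best + 1, out)   # left child span
    split_order(scores, best + 1, hi, out)   # right child span
    return out
```

Because each span's children can be processed independently, this recursion also admits the parallel encoding the abstract mentions.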
Discover, Explanation, Improvement: Automatic Slice Detection Framework for Natural Language Processing
Current natural language processing (NLP) models such as BERT and RoBERTa
have achieved high overall performance, but they often make systematic errors
due to bias or features that are difficult to learn. Thus, research on slice
detection models (SDMs), which automatically identify underperforming groups of
datapoints, has gradually attracted more attention; such work aims at both
understanding model behaviors and providing insights for future model training
and design. However, there is little systematic research on SDMs and on
quantitative evaluation of their assessments of NLP models. Our paper fills this
gap by proposing the "Discover, Explanation, Improvement" framework, which
discovers coherent and underperforming groups of datapoints and unites the
datapoints of each slice under human-understandable concepts; it also provides
comprehensive evaluation tasks and corresponding quantitative metrics, enabling
convenient comparison for future work. Results show that our framework can
accurately select error-prone datapoints with informative semantic features
that summarize error patterns; building on these, it directly boosts the
performance of trained models by an average of 2.85 points across multiple
datasets without tuning any parameters.
Comment: 15 pages, 5 figure
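A deliberately simplified sketch of slice detection in this spirit: group a model's errors under a human-readable key and rank the resulting slices by size. Here the key is the (gold, predicted) label pair, a toy stand-in for the framework's concept-based slicing, not its actual method:

```python
from collections import Counter

def detect_error_slices(golds, preds):
    """Group misclassified datapoints into slices keyed by their
    (gold, predicted) label pair, returned largest-slice first."""
    slices = Counter(
        (g, p) for g, p in zip(golds, preds) if g != p
    )
    return slices.most_common()
```

The largest slices point at systematic error patterns; a real SDM would replace the label-pair key with learned, human-understandable concepts.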
The Trickle-down Impact of Reward (In-)consistency on RLHF
Standard practice within Reinforcement Learning from Human Feedback (RLHF)
involves optimizing against a Reward Model (RM), which itself is trained to
reflect human preferences for desirable generations. A notable yet
understudied subject is the (in-)consistency of RMs -- whether they can
recognize semantic changes across different prompts and adapt their reward
assignments appropriately -- and its impact on the downstream RLHF model.
In this paper, we examine a series of research questions relevant to RM
inconsistency: (1) How can we measure the consistency of reward models? (2) How
consistent are the existing RMs and how can we improve them? (3) In what ways
does reward inconsistency influence the chatbots resulting from the RLHF model
training?
We propose Contrast Instructions -- a benchmarking strategy for the
consistency of RMs. Each example in Contrast Instructions features a pair of
lexically similar instructions with different ground truth responses. A
consistent RM is expected to rank the corresponding instruction and response
higher than other combinations. We observe that current RMs trained with the
standard ranking objective fail miserably on Contrast Instructions compared to
average humans. To show that RM consistency can be improved efficiently without
using extra training budget, we propose two techniques ConvexDA and
RewardFusion, which enhance reward consistency through extrapolation during the
RM training and inference stage, respectively. We show that RLHF models trained
with a more consistent RM yield more useful responses, suggesting that reward
inconsistency exhibits a trickle-down effect on the downstream RLHF process
- …
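The Contrast Instructions evaluation described above can be sketched as a ranking check: for a pair of lexically similar instructions with different gold responses, a consistent RM should score each instruction with its own response above the two swapped pairings. A minimal sketch, assuming a caller-supplied scalar reward function `rm(instruction, response)` (an illustrative interface, not the paper's code):

```python
def contrast_consistent(rm, inst_a, resp_a, inst_b, resp_b):
    """True if the RM ranks each instruction's own gold response above
    both swapped (instruction, response) combinations."""
    return (
        rm(inst_a, resp_a) > rm(inst_a, resp_b)
        and rm(inst_b, resp_b) > rm(inst_b, resp_a)
    )

def contrast_accuracy(rm, examples):
    """Fraction of Contrast Instructions examples the RM ranks correctly.

    Each example is a tuple (inst_a, resp_a, inst_b, resp_b).
    """
    hits = sum(contrast_consistent(rm, *ex) for ex in examples)
    return hits / len(examples)
```

A benchmark built this way directly measures the sensitivity to semantic changes that the abstract argues standard ranking objectives fail to instill.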